An anti-spam filter based on one-class IB method in small training sets

نویسندگان

  • Chen Yang
  • Shaofeng Zhao
  • Dan Zhang
  • Junxia Ma
چکیده

We present an approach to email filtering based on one-class Information Bottleneck (IB) method in small training sets. When themes of emails are changing continually, the available training set which is high-relevant to the current theme will be small. Hence, we further show how to estimate the learning algorithm and how to filter the spam in the small training sets. First, In order to preserve classification accuracy and avoid over-fitting while substantially reducing training set size, we consider the learning framework as the solution of one-class centroid only averaged by highly positive emails, and second, we design a simple binary classification model to filters spam by the comparison of similarity between emails and centroids. Experimental results show that in small training sets our method can significantly improve classification accuracy compared with the currently popular methods, such as: Naive Bayes, AdaBoost and SVM.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارائه روشی مناسب برای دسته بندی نامه های الکترونیکی تبلیغاتی بر مبنای پروفایل کاربران

In general, Spam is related to satisfy or not satisfy the client and isn’t related to the content of the client’s email. According to this definition, problems arise in the field of marketing and advertising for example, it is possible that some of the advertising emails become spam for some users, and not spam for others. To deal with this problem, many researchers design an anti-s...

متن کامل

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest among people to use short message service (SMS) as one of the essential and straightforward communications services on mobile devices. The increased popularity of this service also increased the number of mobile devices attacks such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...

متن کامل

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High dimensional microarray datasets are difficult to classify since they have many features with small number ofinstances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improvethe classification performance of microarray datasets by selecting the significant features. Combining the concepts ofrough sets, weighted rough set, fuzzy rough se...

متن کامل

Not So Naive Online Bayesian Spam Filter

Spam filtering, as a key problem in electronic communication, has drawn significant attention due to increasingly huge amounts of junk email on the Internet. Content-based filtering is one reliable method in combating with spammers changing tactics. Naı̈ve Bayes (NB) is one of the earliest content-based machine learning methods both in theory and practice in combating with spammers, which is eas...

متن کامل

An Anti-spam Filter Combination Framework for Text-and-Image Emails through Incremental Learning

We present an anti-spam filtering framework that combines text-based and image-based anti-spam filters. First, an incremental learning approach to reducing mismatches between training and test datasets is proposed to resolve the problem of a lack of training data for legitimate emails that contain both text and images. Then, the outputs of text-based and image-based filters are combined with th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Int. Arab J. Inf. Technol.

دوره 13  شماره 

صفحات  -

تاریخ انتشار 2016